I chose the PASCAL VOC 2007 dataset; it seems like a good dataset to work with because it's versatile and I'm more familiar with it thanks to the Lab 9 we previously worked on.
So first let's check that all the needed libraries are installed, then import them into our empty notebook and load the PASCAL VOC 2007 dataset.
%pip install tensorflow tensorflow-hub tensorflow-datasets matplotlib
Requirement already satisfied: tensorflow in /usr/local/lib/python3.10/dist-packages (2.17.1) Requirement already satisfied: tensorflow-hub in /usr/local/lib/python3.10/dist-packages (0.16.1) Requirement already satisfied: tensorflow-datasets in /usr/local/lib/python3.10/dist-packages (4.9.7) Requirement already satisfied: matplotlib in /usr/local/lib/python3.10/dist-packages (3.8.0) [... remaining dependency output trimmed ...]
After installing, we need to import the libraries.
# Import necessary libraries
import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_datasets as tfds
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as patches
import cv2
from PIL import Image
import requests
from io import BytesIO
import os
print("TensorFlow version:", tf.__version__)
print("TensorFlow Hub version:", hub.__version__)
TensorFlow version: 2.17.1 TensorFlow Hub version: 0.16.1
Next is downloading the PASCAL VOC 2007 dataset. In the code that downloads a small portion of the PASCAL data, a few things are worth noting.
We define a function *load_data* to load the VOC dataset.
*tfds.load* is the function that downloads and prepares the dataset.
We use only 11% of the training data (and 10% of the validation data) to keep the demonstration manageable.
*shuffle_files=True* ensures that we get a random sample of the dataset.
*with_info=True* returns additional information about the dataset, which we'll use later.
**Remember:** there are 20 object classes, ranging from people and animals to vehicles and indoor items. However, because we are using such a small part of the dataset, the model might not be accurate.
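As a sanity check on how many images those percentage slices actually give us (this arithmetic assumes the standard VOC 2007 split sizes of 2,501 train and 2,510 validation images):

```python
# Rough slice-size arithmetic; tfds percent slicing like 'train[:11%]'
# takes the first 11% of examples in the split.
voc2007_train_size = 2501
voc2007_val_size = 2510

train_subset = voc2007_train_size * 11 // 100  # images in train[:11%]
val_subset = voc2007_val_size * 10 // 100      # images in validation[:10%]

print(f"train[:11%] ≈ {train_subset} images, validation[:10%] ≈ {val_subset} images")
```

So we are working with only a few hundred images, which is why accuracy expectations should stay modest.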
import tensorflow_datasets as tfds
import matplotlib.pyplot as plt
# Load a smaller dataset
def load_data(split='train'):
    dataset, info = tfds.load('voc/2007', split=split, shuffle_files=True, with_info=True)
    return dataset, info
# Load the train dataset and extract info
train_dataset, train_info = load_data('train[:11%]')
# Load the validation dataset
validation_dataset, validation_info = load_data('validation[:10%]')
# Get class names
class_names = train_info.features["objects"]["label"].names # Changed from ds_info to train_info
print("Class names:", class_names)
Class names: ['aeroplane', 'bicycle', 'bird', 'boat', 'bottle', 'bus', 'car', 'cat', 'chair', 'cow', 'diningtable', 'dog', 'horse', 'motorbike', 'person', 'pottedplant', 'sheep', 'sofa', 'train', 'tvmonitor']
For clean_dataset we increased the target size to 512×512, so that the model we are using can handle higher-resolution inputs.
import os
import numpy as np
from PIL import Image
# Function to normalize image sizes and clean dataset
def normalize_images(image_path, target_size=(512, 512)):
    """Resize images to a target size."""
    image = Image.open(image_path).convert("RGB")  # Ensure all images are RGB
    image = image.resize(target_size)
    return np.array(image)

def clean_dataset(dataset_path, target_size=(512, 512)):
    """
    Load and clean dataset by normalizing image sizes.
    Ignores non-image files or corrupted files.
    """
    cleaned_data = []
    for root, _, files in os.walk(dataset_path):
        for file in files:
            try:
                image_path = os.path.join(root, file)
                image = normalize_images(image_path, target_size)
                cleaned_data.append(image)
            except Exception as e:
                print(f"Error processing {file}: {e}")  # Log errors without stopping
    return np.array(cleaned_data)
# Specify the dataset path (note: tfds stores its downloads under ~/tensorflow_datasets, so this relative path contains no image files)
dataset_path = 'voc/2007'
# Clean and normalize images
cleaned_images = clean_dataset(dataset_path)
# Print how many images were successfully processed
print(f"Cleaned {len(cleaned_images)} images")
Cleaned 0 images
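The 0 above happens because 'voc/2007' is a tfds dataset name, not a directory of image files on disk. As a sanity check that the cleaning logic itself works, here is the same function run on a temporary folder that does contain images (the folder and file names are made up for this demo):

```python
import os
import tempfile
import numpy as np
from PIL import Image

def normalize_images(image_path, target_size=(512, 512)):
    """Resize an image to target_size, converting to RGB."""
    image = Image.open(image_path).convert("RGB")
    return np.array(image.resize(target_size))

def clean_dataset(dataset_path, target_size=(512, 512)):
    """Walk a directory, normalizing every readable image and skipping the rest."""
    cleaned = []
    for root, _, files in os.walk(dataset_path):
        for file in files:
            try:
                cleaned.append(normalize_images(os.path.join(root, file), target_size))
            except Exception as e:
                print(f"Skipping {file}: {e}")
    return np.array(cleaned)

# Sanity check: two tiny images plus one non-image file
with tempfile.TemporaryDirectory() as tmp:
    Image.new("RGB", (30, 40)).save(os.path.join(tmp, "a.jpg"))
    Image.new("L", (20, 20)).save(os.path.join(tmp, "b.png"))
    with open(os.path.join(tmp, "notes.txt"), "w") as f:
        f.write("not an image")
    images = clean_dataset(tmp)

print(images.shape)  # (2, 512, 512, 3): the text file was skipped
```

Pointing the original call at a real folder of JPEGs would behave the same way.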
from tensorflow.keras.preprocessing.image import ImageDataGenerator
data_gen = ImageDataGenerator(
    rescale=1.0 / 255.0,       # Normalize pixel values to [0, 1]
    rotation_range=20,         # Rotate images up to 20 degrees
    width_shift_range=0.1,     # Shift images horizontally up to 10%
    height_shift_range=0.1,    # Shift images vertically up to 10%
    horizontal_flip=True,      # Flip images horizontally
    zoom_range=0.1,            # Random zoom up to 10%
    shear_range=0.1,           # Apply shear transformation with 10% max angle
    channel_shift_range=10.0   # Shift image color channels
)

# Apply augmentation to the cleaned images
def augment_data(cleaned_images):
    """
    Augment the cleaned images using the defined augmentation pipeline.
    """
    augmented_images = []
    for image in cleaned_images:
        # Expand dimensions to mimic a batch
        image_batch = np.expand_dims(image, axis=0)
        # Apply augmentation
        for augmented_image in data_gen.flow(image_batch, batch_size=1):
            augmented_images.append(augmented_image[0])  # Add augmented image
            break  # Ensure only one augmented version per image
    return np.array(augmented_images)
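To see what one of these augmentations actually does to the pixels, here is a NumPy-only sketch of the horizontal flip (ImageDataGenerator applies it randomly per image; this is just the deterministic core of the operation, on a made-up toy array):

```python
import numpy as np

# horizontal_flip mirrors the width axis of an (H, W, C) image
def hflip(image):
    return image[:, ::-1, :]

img = np.arange(2 * 3 * 3).reshape(2, 3, 3)  # toy 2x3 "image" with 3 channels
flipped = hflip(img)

# Leftmost column becomes rightmost
assert (flipped[:, 0, :] == img[:, -1, :]).all()
# Flipping twice restores the original
assert (hflip(flipped) == img).all()
```

The rotations, shifts, and shears work the same way conceptually: each is a geometric transform of the pixel grid, applied with randomly drawn parameters each time an image passes through the generator.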
Important things to note: the function takes two parameters, "dataset", which gives us example images from the dataset with ground-truth boxes, and "n=2", how many examples we want.
The function takes "n" examples using .take() and iterates over them to display each example image with its corresponding bounding boxes. Each example is a dictionary containing the key "image" for the image data and "objects" for the bounding boxes.
- It displays the image using plt.imshow(image), configures the figure size with plt.figure, and displays the title with plt.title.
- The loop for box in example["objects"]["bbox"] iterates through the bounding boxes in the objects dictionary. Each bounding box is a list or array with normalized coordinates [ymin, xmin, ymax, xmax]. The coordinates are transformed into pixel coordinates using image.shape[1] for width and image.shape[0] for height.
- patches.Rectangle creates a rectangle corresponding to each box:
  - (xmin * image.shape[1], ymin * image.shape[0]) calculates the top-left corner,
  - (xmax - xmin) * image.shape[1] and (ymax - ymin) * image.shape[0] calculate the width and height of the rectangle.
def display_examples(dataset, n=2):  # Display 'n' examples by default
    for example in dataset.take(n):
        image = example["image"]
        plt.figure(figsize=(10, 10))
        plt.imshow(image)
        plt.title("Image with Ground Truth Bounding Boxes")
        # Draw ground truth boxes
        for box in example["objects"]["bbox"]:
            ymin, xmin, ymax, xmax = box
            rect = patches.Rectangle((xmin * image.shape[1], ymin * image.shape[0]),
                                     (xmax - xmin) * image.shape[1], (ymax - ymin) * image.shape[0],
                                     linewidth=1, edgecolor='g', facecolor='none')
            plt.gca().add_patch(rect)
        plt.show()
# Display some example images with their ground-truth boxes
display_examples(train_dataset)
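The normalized-to-pixel conversion described above is easy to verify by hand. A small numeric sketch (the image size and box values are made up for illustration):

```python
import numpy as np

image = np.zeros((100, 200, 3), dtype=np.uint8)  # height 100, width 200
ymin, xmin, ymax, xmax = 0.1, 0.25, 0.5, 0.75    # tfds bbox: [ymin, xmin, ymax, xmax]

h, w = image.shape[0], image.shape[1]
top_left = (xmin * w, ymin * h)   # (50.0, 10.0): x from width, y from height
width = (xmax - xmin) * w         # 100.0 pixels wide
height = (ymax - ymin) * h        # 40.0 pixels tall

print(top_left, width, height)
```

This is exactly what patches.Rectangle receives, which is why mixing up the axis order (height for x, width for y) produces visibly misplaced boxes.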
So right now we have class_names, which provides the list of class names; target_class_ids, containing the IDs of the classes we are interested in; and find_images_with_classes, a function to find images containing our target classes. Now we need to choose our model. I chose the example the previous lab gave us.
🚨The model is needed to identify and locate objects in images or videos by predicting bounding boxes and class labels for detected objects.🚨
So what's happening down here?
- hub.load(): this goes to the TensorFlow Hub repository and downloads and loads models.
- "https://tfhub.dev/tensorflow/ssd_mobilenet_v2/2": this is the URL of the specific model we're loading; in this case it's an SSD (Single Shot Detector) MobileNet V2 model.
- detector: this is the variable we chose to store the model in.
#Load a pre-trained object detection model
detector = hub.load("https://tfhub.dev/tensorflow/ssd_mobilenet_v2/2")
Now we need to compare the ground truth boxes to our model's predictions, so we can home in on what the model struggles with or performs well on.
# Run Detector and Visualize
def run_detector_and_visualize(example):
    image = example["image"]
    ground_truth_boxes = example["objects"]["bbox"]
    # Preprocess and run detection
    converted_img = tf.image.convert_image_dtype(image, tf.uint8)[tf.newaxis, ...]
    result = detector(converted_img)
    result = {key: value.numpy() for key, value in result.items()}
    # Visualize results (with ground truth for comparison)
    plt.figure(figsize=(10, 7))
    plt.imshow(image)
    # Ground truth boxes (tfds normalized format is [ymin, xmin, ymax, xmax])
    for box in ground_truth_boxes:
        ymin, xmin, ymax, xmax = box
        rect = patches.Rectangle((xmin * image.shape[1], ymin * image.shape[0]),
                                 (xmax - xmin) * image.shape[1], (ymax - ymin) * image.shape[0],
                                 linewidth=1, edgecolor='g', facecolor='none', label='Ground Truth')
        plt.gca().add_patch(rect)
    # Predicted boxes
    for i, score in enumerate(result['detection_scores'][0]):
        if score > 0.5:  # Confidence threshold
            ymin, xmin, ymax, xmax = result['detection_boxes'][0][i]
            class_id = int(result['detection_classes'][0][i])
            # Caveat: this model was trained on COCO, so its class IDs index the COCO
            # label list, not our 20 VOC class_names; the names shown here can be wrong.
            # I added an 'Unknown' label so out-of-range IDs don't crash the plot.
            label = 'Unknown'  # Default label
            if class_id < len(class_names):
                label = class_names[class_id]
            rect = patches.Rectangle((xmin * image.shape[1], ymin * image.shape[0]),
                                     (xmax - xmin) * image.shape[1], (ymax - ymin) * image.shape[0],
                                     linewidth=1, edgecolor='r', facecolor='none', label='Predicted')
            plt.gca().add_patch(rect)
            # plt.text belongs in this loop so each predicted box gets its own label
            plt.text(xmin * image.shape[1], ymin * image.shape[0] - 5, f'{label}: {score:.2f}',
                     color='white', backgroundcolor='r')
    plt.legend()
    plt.show()

# Take a few examples from the training set
for example in train_dataset.take(2):  # Process 2 images
    run_detector_and_visualize(example)
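Eyeballing green vs. red boxes only gets us so far; intersection-over-union (IoU) puts a number on how well a predicted box matches a ground-truth box. A minimal sketch, using the same normalized [ymin, xmin, ymax, xmax] format as above:

```python
def iou(box_a, box_b):
    """Intersection over union of two [ymin, xmin, ymax, xmax] boxes."""
    ymin = max(box_a[0], box_b[0])
    xmin = max(box_a[1], box_b[1])
    ymax = min(box_a[2], box_b[2])
    xmax = min(box_a[3], box_b[3])
    inter = max(0.0, ymax - ymin) * max(0.0, xmax - xmin)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter) if inter > 0 else 0.0

print(iou([0.0, 0.0, 0.5, 0.5], [0.0, 0.0, 0.5, 0.5]))  # identical boxes -> 1.0
print(iou([0.0, 0.0, 0.5, 0.5], [0.5, 0.5, 1.0, 1.0]))  # disjoint boxes  -> 0.0
print(iou([0.0, 0.0, 1.0, 1.0], [0.0, 0.0, 0.5, 1.0]))  # half overlap    -> 0.5
```

An IoU above 0.5 is a common cutoff for calling a detection a match, so scoring each predicted box against its nearest ground-truth box would quantify the struggles we can only see qualitatively in the plots.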
from ultralytics import YOLO
import cv2
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as patches
# Load the pre-trained YOLOv8 model (e.g., yolov8n for the nano model)
model = YOLO('yolov8n.pt')
Creating new Ultralytics Settings v0.0.6 file ✅ View Ultralytics Settings with 'yolo settings' or at '/root/.config/Ultralytics/settings.json' Update Settings with 'yolo settings key=value', i.e. 'yolo settings runs_dir=path/to/dir'. For help see https://docs.ultralytics.com/quickstart/#ultralytics-settings. Downloading https://github.com/ultralytics/assets/releases/download/v8.3.0/yolov8n.pt to 'yolov8n.pt'...
100%|██████████| 6.25M/6.25M [00:00<00:00, 33.6MB/s]
def run_yolov8_and_visualize(image):
    # Convert image to a NumPy array for OpenCV/YOLO
    image_np = np.array(image)
    # Run YOLOv8 detection
    results = model.predict(image_np, verbose=False)
    # Visualize image
    plt.figure(figsize=(10, 7))
    plt.imshow(image_np)
    # Draw predicted boxes
    for box in results[0].boxes:  # Iterate over detected boxes
        xmin, ymin, xmax, ymax = map(int, box.xyxy[0].tolist())  # Bounding box coordinates
        confidence = box.conf[0]  # Confidence score
        class_id = int(box.cls[0])  # Class ID
        # Get class name
        label = model.names[class_id] if class_id in model.names else "Unknown"
        # Draw the bounding box and label
        plt.gca().add_patch(
            patches.Rectangle(
                (xmin, ymin),
                xmax - xmin,
                ymax - ymin,
                linewidth=2,
                edgecolor="red",
                facecolor="none"
            )
        )
        plt.text(
            xmin,
            ymin - 5,
            f"{label}: {confidence:.2f}",
            color="white",
            backgroundcolor="red",
            fontsize=8
        )
    plt.show()

for example in train_dataset.take(2):
    image = example["image"]
    run_yolov8_and_visualize(image)